Trimming (computer programming)

In computer programming, trimming (trim) or stripping (strip) is a string manipulation in which leading and trailing whitespace is removed from a string.

For example, the string (enclosed by apostrophes)

'  this  is a test  '

would be changed, after trimming, to

'this  is a test'

Contents

Variants

The most popular variants of the trim function strip only the beginning or end of the string. Typically named ltrim and rtrim respectively, or in the case of Python: lstrip and rstrip. C# uses TrimStart and TrimEnd, and Common Lisp string-left-trim and string-right-trim. Pascal and Java do not have these variants built-in, although Object Pascal (Delphi) has TrimLeft and TrimRight functions[1].

Many trim functions have an optional parameter to specify a list of characters to trim, instead of the default whitespace characters. For example, PHP and Python allow this optional parameter, while Pascal and Java do not. With Common Lisp's string-trim function, the parameter (called character-bag) is required. The C++ Boost library defines space characters according to locale, as well as offering variants with a predicate parameter (a functor) to select which characters are trimmed.

An uncommon variant of trim returns a special result if no characters remain after the trim operation. For example, Apache Jakarta's StringUtils has a function called stripToNull which returns null in place of an empty string.

An alternative to trimming a string is space normalization, where in addition to removing surrounding whitespace, any sequence of whitespace characters within the string is replaced with a single space. Space normalization is done by Trim() in spreadsheet applications (including Excel, Calc, Gnumeric, and Google Docs), and by the normalize-space() function in XSLT and XPath,

While most algorithms return a new (trimmed) string, some alter the original string in-place. Notably, the Boost library allows either in-place trimming or a trimmed copy to be returned.

Definition of whitespace

The characters which are considered whitespace varies between programming languages and implementations. For example, C traditionally only counts space, tab, line feed, and carriage return characters, while languages which support Unicode typically include all Unicode space characters. Some implementations also include ASCII control codes (non-printing characters) along with whitespace characters.

Java's trim method considers ASCII spaces and control codes as whitespace, while Java's isWhitespace() method recognizes Unicode space characters.

Delphi's Trim function considers characters U+0000 (NULL) through U+0020 (SPACE) to be whitespace.

Usage

Following are examples of trimming a string using several programming languages. All of the implementations shown return a new string and do not alter the original variable.

Example usage Languages
String.Trim([chars]) C#, VB.NET, Windows PowerShell
string.strip(); D
(.trim string) Clojure
sequence [ predicate? ] trim Factor
(string-trim '(#\Space #\Tab #\Newline) string) Common Lisp
(string-trim string) Scheme
string.trim() Java, JavaScript (1.8.1+, Firefox 3.5+)
Trim(String) Pascal [2], QBasic, Visual Basic, Delphi
string.strip() Python
strings.Trim(string, chars) Go
LTRIM(RTRIM(String)) Oracle SQL, T-SQL
strip(string [,option , char]) REXX
string:strip(string [,option , char]) Erlang
string.strip Ruby
(string =~ /\S(.*\S)?/s, $&) Perl 5
string.trim Perl 6
trim(string) PHP
[string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] Objective-C using Cocoa
string withBlanksTrimmed Smalltalk (Squeak, Pharo)
strip(string) SAS
string trim $string Tcl
TRIM(string) or TRIM(ADJUSTL(string)) Fortran
TRIM(string) SQL
TRIM(string) or LTrim(string) or RTrim(String) ColdFusion

Other languages

In languages without a built-in trim function, it is usually simple to create a custom function which accomplishes the same task.

AWK

In AWK, one can use regular expressions to trim:

GeSHi Error: GeSHi could not find the language awk (using path /usr/share/php-geshi/geshi/) (code 2)

You need to specify a language like this: <source lang="html4strict">...</source>

Supported languages for syntax highlighting:

abap, actionscript, actionscript3, ada, apache, applescript, apt_sources, asm, asp, autoit, avisynth, bash, basic4gl, bf, bibtex, blitzbasic, bnf, boo, c, c_mac, caddcl, cadlisp, cfdg, cfm, cil, cmake, cobol, cpp, cpp-qt, csharp, css, d, dcs, delphi, diff, div, dos, dot, eiffel, email, erlang, fo, fortran, freebasic, genero, gettext, glsl, gml, gnuplot, groovy, haskell, hq9plus, html4strict, idl, ini, inno, intercal, io, java, java5, javascript, kixtart, klonec, klonecpp, latex, lisp, locobasic, lolcode, lotusformulas, lotusscript, lscript, lsl2, lua, m68k, make, matlab, mirc, modula3, mpasm, mxml, mysql, nsis, oberon2, objc, ocaml, ocaml-brief, oobas, oracle11, oracle8, pascal, per, perl, php, php-brief, pic16, pixelbender, plsql, povray, powershell, progress, prolog, properties, providex, python, qbasic, rails, rebol, reg, robots, ruby, sas, scala, scheme, scilab, sdlbasic, smalltalk, smarty, sql, tcl, teraterm, text, thinbasic, tsql, typoscript, vb, vbnet, verilog, vhdl, vim, visualfoxpro, visualprolog, whitespace, whois, winbatch, xml, xorg_conf, xpp, z80

or:

GeSHi Error: GeSHi could not find the language awk (using path /usr/share/php-geshi/geshi/) (code 2)

You need to specify a language like this: <source lang="html4strict">...</source>

Supported languages for syntax highlighting:

abap, actionscript, actionscript3, ada, apache, applescript, apt_sources, asm, asp, autoit, avisynth, bash, basic4gl, bf, bibtex, blitzbasic, bnf, boo, c, c_mac, caddcl, cadlisp, cfdg, cfm, cil, cmake, cobol, cpp, cpp-qt, csharp, css, d, dcs, delphi, diff, div, dos, dot, eiffel, email, erlang, fo, fortran, freebasic, genero, gettext, glsl, gml, gnuplot, groovy, haskell, hq9plus, html4strict, idl, ini, inno, intercal, io, java, java5, javascript, kixtart, klonec, klonecpp, latex, lisp, locobasic, lolcode, lotusformulas, lotusscript, lscript, lsl2, lua, m68k, make, matlab, mirc, modula3, mpasm, mxml, mysql, nsis, oberon2, objc, ocaml, ocaml-brief, oobas, oracle11, oracle8, pascal, per, perl, php, php-brief, pic16, pixelbender, plsql, povray, powershell, progress, prolog, properties, providex, python, qbasic, rails, rebol, reg, robots, ruby, sas, scala, scheme, scilab, sdlbasic, smalltalk, smarty, sql, tcl, teraterm, text, thinbasic, tsql, typoscript, vb, vbnet, verilog, vhdl, vim, visualfoxpro, visualprolog, whitespace, whois, winbatch, xml, xorg_conf, xpp, z80

C/C++

There is no standard trim function in C or C++. Most of the available string libraries[3] for C contain code which implements trimming, or functions that significantly ease an efficient implementation. The function has also often been called EatWhitespace in some, non-standard C libraries.

The open source C++ library Boost has several trim variants, including a standard one: [4]

#include <boost/algorithm/string/trim.hpp>
trimmed = boost::algorithm::trim_copy("string");

Note that with boost's function named simply trim the input sequence is modified in-place[5], and does not return a result.

Another open source C++ library Qt has several trim variants, including a standard one: [6]

#include <QString>
trimmed = s.trimmed();

The Linux kernel also includes a strip function, strstrip(), since 2.6.18-rc1, which trims the string "in place". Since 2.6.33-rc1, the kernel uses strim() instead of strstrip() to avoid false warnings. [7]

Haskell

A trim algorithm in Haskell:

 import Data.Char (isSpace)
 trim      :: String -> String
 trim      = f . f
    where f = reverse . dropWhile isSpace

may be interpreted as follows: f drops the preceding whitespace, and reverses the string. f is then again applied to its own output. Note that the type signature (the second line) is optional.

J

The trim algorithm in J is a functional description:

GeSHi Error: GeSHi could not find the language j (using path /usr/share/php-geshi/geshi/) (code 2)

You need to specify a language like this: <source lang="html4strict">...</source>

Supported languages for syntax highlighting:

abap, actionscript, actionscript3, ada, apache, applescript, apt_sources, asm, asp, autoit, avisynth, bash, basic4gl, bf, bibtex, blitzbasic, bnf, boo, c, c_mac, caddcl, cadlisp, cfdg, cfm, cil, cmake, cobol, cpp, cpp-qt, csharp, css, d, dcs, delphi, diff, div, dos, dot, eiffel, email, erlang, fo, fortran, freebasic, genero, gettext, glsl, gml, gnuplot, groovy, haskell, hq9plus, html4strict, idl, ini, inno, intercal, io, java, java5, javascript, kixtart, klonec, klonecpp, latex, lisp, locobasic, lolcode, lotusformulas, lotusscript, lscript, lsl2, lua, m68k, make, matlab, mirc, modula3, mpasm, mxml, mysql, nsis, oberon2, objc, ocaml, ocaml-brief, oobas, oracle11, oracle8, pascal, per, perl, php, php-brief, pic16, pixelbender, plsql, povray, powershell, progress, prolog, properties, providex, python, qbasic, rails, rebol, reg, robots, ruby, sas, scala, scheme, scilab, sdlbasic, smalltalk, smarty, sql, tcl, teraterm, text, thinbasic, tsql, typoscript, vb, vbnet, verilog, vhdl, vim, visualfoxpro, visualprolog, whitespace, whois, winbatch, xml, xorg_conf, xpp, z80

That is: filter (#~) for non-space characters (' '&~:) between leading (+./\) and (*.) trailing (+./\.) spaces.

JavaScript

There is a built-in trim function in JavaScript 1.8.1 (Firefox 3.5 and later), and the ECMAScript 5 standard. In earlier versions it can be added to the String object's prototype as follows:

String.prototype.trim = function() {
  return this.replace(/^\s+/g, "").replace(/\s+$/g, "");
};

Perl

Perl 5 has no built-in trim function. However, the functionality is commonly achieved using regular expressions.

Example:

$string =~ s/^\s+//;            # remove leading whitespace
$string =~ s/\s+$//;            # remove trailing whitespace

or:

$string =~ s/^\s+|\s+$//g ;     # remove both leading and trailing whitespace

These examples modify the value of the original variable $string.

Also available for Perl is StripLTSpace in String::Strip from CPAN.

There are, however, two functions that are commonly used to strip whitespace from the end of strings, chomp and chop:

In Perl 6, the upcoming major revision of the language, strings have a trim method.

Example:

$string = $string.trim;     # remove leading and trailing whitespace
$string .= trim;            # same thing

Tcl

The Tcl string command has three relevant subcommands: trim, trimright and trimleft. For each of those commands, an additional argument may be specified: a string that represents a set of characters to remove -- the default is whitespace (space, tab, newline, carriage return).

Example of trimming vowels:

set string onomatopoeia
set trimmed [string trim $string aeiou]         ;# result is nomatop
set r_trimmed [string trimright $string aeiou]  ;# result is onomatop
set l_trimmed [string trimleft $string aeiou]   ;# result is nomatopoeia

XSLT

XSLT includes the function normalize-space(string) which strips leading and trailing whitespace, in addition to replacing any whitespace sequence (including line breaks) with a single space.

Example:

<xsl:variable name='trimmed'>
   <xsl:value-of select='normalize-space(string)'/>
</xsl:variable>

XSLT 2.0 includes regular expressions, providing another mechanism to perform string trimming.

Another XSLT technique for trimming is to utilize the XPath 2.0 substring() function.

See also

External links